3.6 k-fold Cross-Validation and the Bias-Variance Trade-off

Summary (final paragraph)

When increasing the number of folds k:

The bias of the performance estimator decreases (more accurate).

The variance of the performance estimator increases (more variability).

The computational cost increases (more iterations, larger training sets during fitting).

Exception: decreasing the value of k in k-fold cross-validation to small values (for example, 2 or 3) also increases the variance on small datasets due to random sampling effects. (Presumably this means the variance of the performance estimate?)

(IMO: isn't this what Figure 5 in Section 2.2 is about?)
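The cost point in the summary above can be made concrete with a small sketch. It is not from the source: `kfold_indices` and `cost_summary` are hypothetical helpers written for these notes, and the dataset size of 100 is an arbitrary assumption. The sketch only counts model fits and training-set sizes; it does not train anything.

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over range(n)."""
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def cost_summary(n, k):
    """Return (number of model fits, smallest training-set size) for k-fold CV."""
    splits = list(kfold_indices(n, k))
    return len(splits), min(len(train) for train, _ in splits)


n = 100  # hypothetical dataset size, chosen only for illustration
for k in (2, 5, 10, n):  # k == n corresponds to LOOCV
    fits, train_size = cost_summary(n, k)
    print(f"k={k:>3}: {fits} fits, at least {train_size} training examples per fit")
```

As k grows, both the number of fits and the size of each training set grow, which is where the extra computation comes from.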

We may prefer LOOCV over single train/test splits via the holdout method for small and moderately sized datasets.

We can think of the LOOCV estimate as being approximately unbiased:

With n training examples, each model is trained on n-1 of them (almost all of the data).

One downside of using LOOCV over k-fold cross-validation with k < n is the large variance of the LOOCV estimate.

LOOCV is defective when a discontinuous loss function such as the 0-1 loss is used.

Because each test set contains only one sample, the variance becomes large (pointed out in prior work).

This argument is correct for the variance between folds.
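A minimal sketch of why the per-fold scores are so coarse under the 0-1 loss. The majority-class predictor and the label vector are my own assumptions for illustration, not something taken from the source; `loocv_zero_one_scores` is a hypothetical helper.

```python
from collections import Counter
import statistics


def loocv_zero_one_scores(labels):
    """Run LOOCV with a majority-class predictor; return one 0/1 score per fold."""
    scores = []
    for i, held_out in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        # Predict the most frequent training label (ties go to the first one seen).
        prediction = Counter(train).most_common(1)[0][0]
        scores.append(1 if prediction == held_out else 0)
    return scores


labels = [1] * 7 + [0] * 3  # hypothetical labels: 7 positives, 3 negatives
scores = loocv_zero_one_scores(labels)
accuracy = statistics.fmean(scores)
fold_variance = statistics.pvariance(scores)  # spread of the 0/1 fold scores
print(scores, accuracy, fold_variance)
```

Each fold can only report 0 or 1, so the spread between folds is as large as a single Bernoulli outcome allows.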

Treat each prediction as a Bernoulli trial (1.7 Confidence Intervals via Normal Approximation).

The number of correct predictions X follows the binomial distribution B(n, p).

The variance of this binomial distribution is np(1 - p).
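The step from the binomial variance to the single-sample test set can be written out as follows. This is my own filling-in, assuming independent predictions with a common success probability p; here n is the number of test examples and ACC = X/n is a symbol introduced for these notes.

```latex
X \sim B(n, p), \qquad \operatorname{Var}(X) = n p (1 - p)

\mathrm{ACC} = \frac{X}{n}
\quad\Longrightarrow\quad
\operatorname{Var}(\mathrm{ACC}) = \frac{\operatorname{Var}(X)}{n^{2}} = \frac{p (1 - p)}{n}

n = 1 \ \text{(one LOOCV test fold)}
\quad\Longrightarrow\quad
\operatorname{Var}(\mathrm{ACC}) = p (1 - p)
```

With one test sample per fold there is no 1/n reduction, so each fold's score has the full Bernoulli variance p(1 - p).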

We can estimate the variability of a statistic (here: the performance of the model) from the variability of that statistic between subsamples.

Now, when we are talking about the variance of LOOCV, we typically mean the difference in the results that we would get if we repeated the resampling procedure multiple times on different data samples from the underlying distribution.
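This notion of variance can be sketched as a simulation: draw several independent datasets, run LOOCV on each, and look at the spread of the resulting estimates. Everything here is an assumption for illustration (the two Gaussian classes, the toy nearest-class-mean classifier, the sample sizes); the helper names are hypothetical and the sketch makes no claim about what values to expect.

```python
import random
import statistics


def draw_dataset(rng, n_per_class=15):
    """Sample a hypothetical 1-D two-class dataset: class 0 ~ N(0, 1), class 1 ~ N(1.5, 1)."""
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(1.5, 1.0), 1) for _ in range(n_per_class)]
    return data


def predict(train, x):
    """Toy classifier: return the label whose training mean is closest to x."""
    means = {
        label: statistics.fmean(v for v, y in train if y == label)
        for label in {y for _, y in train}
    }
    return min(means, key=lambda label: abs(x - means[label]))


def loocv_accuracy(data):
    """Hold out each example once and return the fraction predicted correctly."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, x) == y
    return hits / len(data)


def loocv_variance_across_samples(n_datasets=50, seed=0):
    """Return the LOOCV estimates on independent datasets and their sample variance."""
    rng = random.Random(seed)
    estimates = [loocv_accuracy(draw_dataset(rng)) for _ in range(n_datasets)]
    return estimates, statistics.variance(estimates)


estimates, variance = loocv_variance_across_samples()
print(f"mean estimate {statistics.fmean(estimates):.3f}, variance {variance:.5f}")
```

The variance returned here is taken across datasets, which is different from the fold-to-fold spread within one LOOCV run.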

Because the n training sets in LOOCV are so similar to one another, the estimate has high variance (Hastie et al. 2009).

An alternative explanation: the mean of highly correlated variables has a higher variance than the mean of variables that are not highly correlated.

This is explained using covariance and variance (TODO: I think I heard this at university).
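For reference, the covariance argument can be worked out as below. The simplifying assumption that all n variables share the same variance \sigma^2 and the same pairwise correlation \rho is mine, made only to get a closed form.

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{1}{n^{2}}\left[\sum_{i=1}^{n}\operatorname{Var}(X_i)
  + \sum_{i \neq j}\operatorname{Cov}(X_i, X_j)\right]

\operatorname{Var}(X_i) = \sigma^{2},\quad
\operatorname{Cov}(X_i, X_j) = \rho\,\sigma^{2}\ (i \neq j)
\quad\Longrightarrow\quad
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
= \frac{\sigma^{2}}{n} + \frac{n-1}{n}\,\rho\,\sigma^{2}
```

With \rho = 0 the variance of the mean shrinks as \sigma^2 / n, while with \rho close to 1 it stays close to \sigma^2 however many variables are averaged.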

What value of k is best?

Kohavi's experiments on various real-world datasets suggest that 10-fold cross-validation offers the best trade-off between bias and variance.

Kohavi 1995 A study of cross-validation and bootstrap for accuracy estimation and model selection

Other research indicates that repeating k-fold cross-validation can increase the precision of the estimate while maintaining a small bias.
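A minimal sketch of repeated k-fold cross-validation, assuming a toy 1-D dataset and a nearest-class-mean classifier that I made up for illustration. The helper names (`make_data`, `kfold_accuracy`, `repeated_kfold`) are hypothetical, and this is not the procedure from the research mentioned above, only the general idea of reshuffling and averaging.

```python
import random
import statistics


def make_data(seed=42, n_per_class=20):
    """Hypothetical two-class data: class 0 ~ N(0, 1), class 1 ~ N(2, 1)."""
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(2.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def nearest_mean_predict(train, x):
    """Toy classifier: return the label whose training mean is closest to x."""
    means = {
        label: statistics.fmean(v for v, y in train if y == label)
        for label in {y for _, y in train}
    }
    return min(means, key=lambda label: abs(x - means[label]))


def kfold_accuracy(data, k, rng):
    """Shuffle a copy of the data with rng, then return accuracy pooled over k folds."""
    data = data[:]
    rng.shuffle(data)
    correct = 0
    for i in range(k):
        test = data[i::k]  # every k-th example, starting at i, forms fold i
        train = [ex for j, ex in enumerate(data) if j % k != i]
        correct += sum(nearest_mean_predict(train, x) == y for x, y in test)
    return correct / len(data)


def repeated_kfold(data, k, repeats, seed=0):
    """Run k-fold CV `repeats` times with different shuffles; return all estimates."""
    rng = random.Random(seed)
    return [kfold_accuracy(data, k, rng) for _ in range(repeats)]


data = make_data()
scores = repeated_kfold(data, k=10, repeats=5)
print(f"per-repeat: {scores}, averaged: {statistics.fmean(scores):.3f}")
```

Averaging over the repeats smooths out the dependence on one particular random partition, while each fit still uses the same fraction of the data as ordinary 10-fold cross-validation.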